Efficient Processing of Distributed Iceberg Semi-joins

Authors

  • Mohammed Kasim Imthiyaz
  • Dong Xiaoan
  • Panos Kalnis
Abstract

The Iceberg SemiJoin (ISJ) of two datasets R and S returns the tuples in R which join with at least k tuples of S. The ISJ operator is essential in many practical applications including OLAP, Data Mining and Information Retrieval. In this paper we consider the distributed evaluation of Iceberg SemiJoins, where R and S reside on remote servers. We developed an efficient algorithm which employs Bloom filters. The novelty of our approach is that we interleave the evaluation of the Iceberg set in server S with the pruning of unmatched tuples in server R. Therefore, we are able to (i) eliminate unnecessary tuples early, and (ii) extract accurate Bloom filters from the intermediate hash tables which are constructed during the generation of the Iceberg set. Compared to conventional two-phase approaches, our experiments demonstrate that our method transmits up to 80% less data through the network, while reducing the disk I/O cost.
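The definition above, together with the general idea of pruning R with a Bloom filter, can be illustrated with a minimal Python sketch. It is not the interleaved algorithm of the paper: it shows only a naive in-memory ISJ and a simple filter-then-prune step of the kind a conventional two-phase approach might use. All names (`iceberg_semijoin`, `BloomFilter`, `iceberg_bloom_filter`, `prune_with_filter`), the filter size, and the hashing scheme are assumptions made for illustration.

```python
import hashlib
from collections import Counter


def first_field(t):
    """Assumed join key: the first field of a tuple."""
    return t[0]


def iceberg_semijoin(R, S, k, key=first_field):
    """Return the tuples of R that join with at least k tuples of S."""
    counts = Counter(key(s) for s in S)
    return [r for r in R if counts[key(r)] >= k]


class BloomFilter:
    """Toy Bloom filter (assumed parameters; one byte per bit for clarity)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)

    def _positions(self, item):
        # Derive num_hashes positions by salting a SHA-256 digest.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item):
        return all(self.bits[p] for p in self._positions(item))


def iceberg_bloom_filter(S, k, key=first_field):
    """On the S side: summarize the join values that occur at least k times."""
    counts = Counter(key(s) for s in S)
    bf = BloomFilter()
    for value, count in counts.items():
        if count >= k:
            bf.add(value)
    return bf


def prune_with_filter(R, bf, key=first_field):
    """On the R side: drop tuples whose join value is definitely not in the filter."""
    return [r for r in R if key(r) in bf]
```

Because a Bloom filter can produce false positives but never false negatives, the pruned R is a superset of the true ISJ result, so a final exact check is still needed on the surviving tuples.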

Similar resources

Processing Inequality Queries

Bernstein and Goodman showed that natural inequality (NI) queries can be processed efficiently by semijoins, if there are no multiple inequality join edges, nor cycles with one or zero doublet. In this paper procedures to handle these cases efficiently are given. Multiple inequality join edges can be processed by multi-attribute inequality semijoins. Two procedures based on generalized semi-j...

Analysis of Joins and Semi Joins in a Distributed Database Query

A database is defined as a collection of files or tables, whereas DBMS stands for Database Management System, a collection of unified programs used to manage the overall activities of the database. The two dominant approaches used for storing and managing databases are centralized database management systems and distributed database management systems, in which data is placed at a central location and ...

Using Remote Joins for the Processing of Distributed Mobile Queries

The query processing in a mobile computing environment involves join processing among different sites which include static servers and mobile computers. In this paper, we first present some unique features of a mobile environment, and then, in light of these features, devise query processing methods for both join and query processing. Remote mobile joins are said to be effectual if they are, wh...

Efficient Iceberg Query Processing in Sensor Networks

The iceberg query finds data whose aggregate values exceed a pre-specified threshold. To process an iceberg query in sensor networks, all sensor data have to be aggregated and then sensor data whose aggregate values are smaller than the threshold are eliminated. Whether a certain sensor datum is in the query result depends on the other sensor data values. Since sensor nodes are distributed, com...

Distributed Query Processing in the Internet: Exploring Relation Replication and Network Characteristics

We introduce the concept of network graph for distributed query processing. Semijoins and joins are termed contributive replicated semijoins and contributive replicated joins, respectively, when they are interleaved into a join sequence to reduce the amount of data transmission cost required in a network with replicated relations. Our solution procedure consists of three consecutive steps, name...


Publication date: 2004